Bioinformatics (Thomas Dandekar, Meik Kunz)

from the observed sequences how probable each mutation is at each position, i.e. compile a very large table (more precisely: a matrix of transition probabilities) and then calculate the phylogenetic tree from it. This third method is particularly computationally intensive and time-consuming, but it is generally also the most accurate of the three.
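To make the idea of a matrix of transition probabilities more concrete, here is a minimal sketch in Python. It assumes the simple Jukes-Cantor model, in which all nucleotide exchanges are equally likely; the text does not prescribe a particular model, and real maximum likelihood programs usually estimate richer matrices from the data. The function names and the two short sequences are hypothetical and serve only as an illustration.

```python
import math

NUCLEOTIDES = "ACGT"


def jc69_transition_matrix(d):
    """Jukes-Cantor transition probabilities for a branch of length d.

    d is the expected number of substitutions per site (assumed model:
    every nucleotide changes into every other one at the same rate).
    """
    decay = math.exp(-4.0 * d / 3.0)
    same = 0.25 + 0.75 * decay   # probability that the nucleotide is unchanged
    diff = 0.25 - 0.25 * decay   # probability of one specific exchange
    return {a: {b: (same if a == b else diff) for b in NUCLEOTIDES}
            for a in NUCLEOTIDES}


def pair_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences separated by distance d > 0."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    p = jc69_transition_matrix(d)
    # 0.25 is the assumed equilibrium frequency of each nucleotide.
    return sum(math.log(0.25 * p[x][y]) for x, y in zip(seq_a, seq_b))


def estimate_distance(seq_a, seq_b):
    """Grid search for the distance with the highest likelihood."""
    candidates = [i / 1000 for i in range(1, 2001)]
    return max(candidates, key=lambda d: pair_log_likelihood(seq_a, seq_b, d))


# Toy example: two made-up sequences that differ at one of ten positions.
print(estimate_distance("ACGTACGTAC", "ACGTACGTAT"))
```

A full maximum likelihood analysis repeats this kind of calculation for every branch of every candidate tree, which is why the method is so time-consuming.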

In practice, it is also important to know that the faster methods go astray more easily when things get complicated. Depending on the calculation rule used, the result is more or less easily distorted. This happens especially when branches of very different lengths are compared, i.e. when individual sequences have evolved much faster than the others and therefore sit on a long branch (“long branch attraction”). The infobox summarizes a number of tools.

Thus, bioinformatics enables us to describe evolution more precisely and to understand important aspects of it by analysing many such phylogenetic trees, but also genomes and, in particular, by taking a detailed look at individual gene families. By analysing the amino acid sequences involved, together with the available structural data of important enzymes, it is possible to describe exactly how these enzymes function, which amino acid residues are important for the chemical reaction they catalyse and which functional subunits they consist of. These subunits are also known as protein domains. They are typically 100–150 amino acids long, fold stably (hence their size – if they were longer they would fold into several separate units, if they were shorter they would not fold stably at all) and each has a specific function. For example, there are catalytic domains, regulatory domains, interaction domains, domains that bind cofactors (often vitamins), and domains that give the protein a solid structure (e.g. fibrils or fibers). Looking at protein families can shed light on how a protein function changes or adapts across different organisms and how, for example, additional mutations can turn a catalytic domain into a regulatory domain.

Phylogenetics Tools

Phylogeny

Family trees resemble real trees if there is a clear root (origin), which can be obtained, for example, by including a distant species (“outgroup”).

Basically, there are three ways to calculate family trees:

• Neighbor joining always merges direct neighbours and then recalculates the distances. This can be done quickly and is implemented efficiently in the CLUSTALW software, for example.
• Parsimony tries to calculate the family tree with as few mutations as possible. This is already more computationally expensive.
• Maximum likelihood is the most computationally expensive procedure. Each nucleotide exchange is weighted according to its (often estimated) probability, and then the most probable phylogenetic tree is calculated.
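The parsimony idea from the list above can be sketched in a few lines of Python. The example uses Fitch's counting rule, a standard way to determine the minimum number of mutations for a given tree; this is an illustration and not necessarily the procedure used by any particular program. The species names, the short sequences and both tree shapes are made up for the example.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum mutations) for one column.

    tree is either a leaf name (str) or a pair (left_subtree, right_subtree);
    states maps each leaf name to its nucleotide in this alignment column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # Both subtrees can agree on a state: no extra mutation is needed.
        return common, left_cost + right_cost
    # No shared state: one mutation must have happened on this level.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of mutations over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical toy alignment (not real data).
alignment = {
    "human": "AAGTC",
    "chimp": "AAGTT",
    "mouse": "CTGAC",
    "rat":   "CTGAT",
}

tree_a = (("human", "chimp"), ("mouse", "rat"))
tree_b = (("human", "mouse"), ("chimp", "rat"))

print(parsimony_score(tree_a, alignment))
print(parsimony_score(tree_b, alignment))
```

The tree with the lower score explains the sequences with fewer mutations and is therefore preferred. The expensive part in practice is not the counting itself but the search over the very large number of possible trees.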

10.4 Describing Evolution: Phylogenetic Trees